AITopics | ranking method

ranking method

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

cd305fdee96836d5cc1de94577d71b61-Paper-Conference.pdf

Neural Information Processing Systems, Aug-19-2025, 00:03:12 GMT

artificial intelligence, generalization, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Waltham (0.04)
(3 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Likert or Not: LLM Absolute Relevance Judgments on Fine-Grained Ordinal Scales

Godfrey, Charles, Nie, Ping, Ostapuk, Natalia, Ken, David, Gao, Shang, Inati, Souheil

arXiv.org Artificial Intelligence, May-27-2025

Large language models (LLMs) obtain state-of-the-art zero-shot relevance ranking performance on a variety of information retrieval tasks. The two most common prompts to elicit LLM relevance judgments are pointwise scoring (a.k.a. relevance generation), where the LLM sees a single query-document pair and outputs a single relevance score, and listwise ranking (a.k.a. permutation generation), where the LLM sees a query and a list of documents and outputs a permutation, sorting the documents in decreasing order of relevance. The current research community consensus is that listwise ranking yields superior performance, and significant research effort has been devoted to crafting LLM listwise ranking algorithms. The underlying hypothesis is that LLMs are better at making relative relevance judgments than absolute ones. In tension with this hypothesis, we find that the gap between pointwise scoring and listwise ranking shrinks when pointwise scoring is implemented using a sufficiently large ordinal relevance label space, becoming statistically insignificant for many LLM-benchmark dataset combinations (where "significant" means "95% confidence that listwise ranking improves NDCG@10"). Our evaluations span four LLMs, eight benchmark datasets from the BEIR and TREC-DL suites, and two proprietary datasets with relevance labels collected after the training cut-off of all LLMs evaluated.
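
The abstract compares prompting strategies by NDCG@10. As a rough illustration of how pointwise scores on an ordinal scale turn into that metric, here is a minimal sketch. It assumes the common exponential-gain form of DCG (the abstract does not say which variant the authors use), and the scores and labels are made up for the example.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k graded relevance labels."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(scores, labels, k=10):
    """NDCG@k for documents ranked by descending predicted score."""
    # Python's sort is stable, so tied scores keep their original order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [labels[i] for i in order]
    ideal_dcg = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical pointwise scores on a 0-10 ordinal scale for five documents,
# with made-up ground-truth labels on a 0-3 scale.
pointwise_scores = [9, 2, 7, 7, 0]
true_labels = [3, 0, 1, 2, 0]
print(round(ndcg_at_k(pointwise_scores, true_labels, k=10), 4))
```

A listwise permutation can be scored with the same function by assigning each document a score equal to the negative of its position in the permutation. The tie between the two documents scored 7 shows why a finer label space can matter: with fewer distinct labels, more documents tie and their relative order is left to the tie-breaking rule.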

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2505.19334

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Bayesian Active Learning for Multi-Criteria Comparative Judgement in Educational Assessment

Gray, Andy, Rahat, Alma, Crick, Tom, Lindsay, Stephen

arXiv.org Machine Learning, Mar-1-2025

Comparative Judgement (CJ) provides an alternative assessment approach by evaluating work holistically rather than breaking it into discrete criteria. This method leverages human ability to make nuanced comparisons, yielding more reliable and valid assessments. CJ aligns with real-world evaluations, where overall quality emerges from the interplay of various elements. However, rubrics remain widely used in education, offering structured criteria for grading and detailed feedback. This creates a gap between CJ's holistic ranking and the need for criterion-based performance breakdowns. This paper addresses this gap using a Bayesian approach. We build on Bayesian CJ (BCJ) by Gray et al., which directly models preferences instead of using likelihoods over total scores, allowing for expected ranks with uncertainty estimation. Their entropy-based active learning method selects the most informative pairwise comparisons for assessors. We extend BCJ to handle multiple independent learning outcome (LO) components, defined by a rubric, enabling both holistic and component-wise predictive rankings with uncertainty estimates. Additionally, we propose a method to aggregate entropies and identify the most informative comparison for assessors. Experiments on synthetic and real data demonstrate our method's effectiveness. Finally, we address a key limitation of BCJ, which is the inability to quantify assessor agreement. We show how to derive agreement levels, enhancing transparency in assessment.

entropy pair selector, multi-criteria comparative judgement, ranking method, (8 more...)

arXiv.org Machine Learning

2503.00479

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
Oceania > New Zealand (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry:

Education > Educational Setting > K-12 Education (0.67)
Education > Educational Setting > Online (0.67)
Education > Assessment & Standards > Student Performance (0.46)
Education > Educational Technology > Educational Software (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

OrdRankBen: A Novel Ranking Benchmark for Ordinal Relevance in NLP

Wang, Yan, Qian, Lingfei, Peng, Xueqing, Huang, Jimin, Feng, Dongji

arXiv.org Artificial Intelligence, Mar-1-2025

The evaluation of ranking tasks remains a significant challenge in natural language processing (NLP), particularly due to the lack of direct labels for results in real-world scenarios. Benchmark datasets play a crucial role in providing standardized testbeds that ensure fair comparisons, enhance reproducibility, and enable progress tracking, facilitating rigorous assessment and continuous improvement of ranking models. Existing NLP ranking benchmarks typically use binary relevance labels or continuous relevance scores, neglecting ordinal relevance scores. However, binary labels oversimplify relevance distinctions, while continuous scores lack a clear ordinal structure, making it challenging to capture nuanced ranking differences effectively. To address these challenges, we introduce OrdRankBen, a novel benchmark designed to capture multi-granularity relevance distinctions. Unlike conventional benchmarks, OrdRankBen incorporates structured ordinal labels, enabling more precise ranking evaluations. Given the absence of suitable datasets for ordinal relevance ranking in NLP, we constructed two datasets with distinct ordinal label distributions. We further evaluate models of three types on these datasets: ranking-based language models, general large language models, and ranking-focused large language models. Experimental results show that ordinal relevance modeling provides a more precise evaluation of ranking models, improving their ability to distinguish multi-granularity differences among ranked items, which is crucial for tasks that demand fine-grained relevance differentiation.

ranking task, relevance label, relevance score, (12 more...)

arXiv.org Artificial Intelligence

2503.00674

Country:

North America > United States > District of Columbia > Washington (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Minnesota (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Learning when to rank: Estimation of partial rankings from sparse, noisy comparisons

Morel-Balbi, Sebastian, Kirkley, Alec

arXiv.org Machine Learning, Jan-5-2025

A common task arising in various domains is that of ranking items based on the outcomes of pairwise comparisons, from ranking players and teams in sports to ranking products or brands in marketing studies and recommendation systems. Statistical inference-based methods such as the Bradley-Terry model, which extract rankings based on an underlying generative model of the comparison outcomes, have emerged as flexible and powerful tools to tackle the task of ranking in empirical data. In situations with limited and/or noisy comparisons, it is often challenging to confidently distinguish the performance of different items based on the evidence available in the data. However, existing inference-based ranking methods overwhelmingly choose to assign each item to a unique rank or score, suggesting a meaningful distinction when there is none. Here, we address this problem by developing a principled Bayesian methodology for learning partial rankings -- rankings with ties -- that distinguishes among the ranks of different items only when there is sufficient evidence available in the data. Our framework is adaptable to any statistical ranking method in which the outcomes of pairwise observations depend on the ranks or scores of the items being compared. We develop a fast agglomerative algorithm to perform Maximum A Posteriori (MAP) inference of partial rankings under our framework and examine the performance of our method on a variety of real and synthetic network datasets, finding that it frequently gives a more parsimonious summary of the data than traditional ranking, particularly when observations are sparse.
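
For readers unfamiliar with the Bradley-Terry model mentioned above, the following is a minimal sketch of a standard maximum-likelihood fit using the classic iterative (minorization-maximization) update. It is not the paper's Bayesian partial-ranking method, and the win counts are invented for illustration. Under the model, item i beats item j with probability p_i / (p_i + p_j).

```python
def fit_bradley_terry(wins, n_iter=200):
    """Fit Bradley-Terry strengths from a matrix of pairwise win counts.

    wins[i][j] is the number of times item i beat item j.
    Returns strengths normalized to sum to 1.
    """
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(n_iter):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Each opponent contributes (games played) / (combined strength).
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            new_p.append(total_wins / denom if denom > 0 else p[i])
        norm = sum(new_p)
        p = [x / norm for x in new_p]
    return p

# Hypothetical results: each pair of three items was compared 10 times.
wins = [
    [0, 7, 8],  # item 0 beat item 1 seven times and item 2 eight times
    [3, 0, 6],
    [2, 4, 0],
]
strengths = fit_bradley_terry(wins)
print([round(s, 3) for s in strengths])
```

A fit like this always returns a strict ordering of the items, however little data supports it. That is the behaviour the abstract argues against when comparisons are sparse or noisy.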

artificial intelligence, machine learning, ranking, (17 more...)

arXiv.org Machine Learning

2501.02505

Country: North America > United States (0.93)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Sports > Soccer (0.68)
Education (0.67)
Leisure & Entertainment > Games > Chess (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

A Novel Pareto-optimal Ranking Method for Comparing Multi-objective Optimization Algorithms

Ibrahim, Amin, Bidgoli, Azam Asilian, Rahnamayan, Shahryar, Deb, Kalyanmoy

arXiv.org Artificial Intelligence, Nov-26-2024

As the interest in multi- and many-objective optimization algorithms grows, the performance comparison of these algorithms becomes increasingly important. A large number of performance indicators for multi-objective optimization algorithms have been introduced, each of which evaluates these algorithms based on a certain aspect. Therefore, assessing the quality of multi-objective results using multiple indicators is essential to guarantee that the evaluation considers all quality perspectives. This paper proposes a novel multi-metric comparison method to rank the performance of multi-/many-objective optimization algorithms based on a set of performance indicators. We utilize the Pareto optimality concept (i.e., non-dominated sorting algorithm) to create the rank levels of algorithms by simultaneously considering multiple performance indicators as criteria/objectives. As a result, four different techniques are proposed to rank algorithms based on their contribution at each Pareto level. This method allows researchers to utilize a set of existing/newly developed performance metrics to adequately assess/rank multi-/many-objective algorithms. The proposed methods are scalable and can accommodate any newly introduced metric in their comprehensive scheme. The method was applied to rank 10 competing algorithms in the 2018 CEC competition solving 15 many-objective test problems. The Pareto-optimal ranking was conducted based on 10 well-known multi-objective performance indicators and the results were compared to the final ranks reported by the competition, which were based on the inverted generational distance (IGD) and hypervolume indicator (HV) measures. The techniques suggested in this paper have broad applications in science and engineering, particularly in areas where multiple metrics are used for comparisons. Examples include machine learning and data mining.
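
The Pareto-level construction described above can be sketched with a simple non-dominated sorting routine. This shows only how algorithms are grouped into levels, not the four ranking techniques the paper builds on top of them. The algorithm names and indicator values are hypothetical, and every indicator is assumed to be expressed so that lower is better (a higher-is-better indicator such as hypervolume would need to be negated first).

```python
def dominates(a, b):
    """True if a is no worse than b on every indicator and strictly better on one.

    Assumes lower indicator values are better.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_levels(scores):
    """Group items into Pareto levels; level 0 is the non-dominated set."""
    remaining = dict(scores)
    levels = []
    while remaining:
        # The current front holds every item that no other remaining item dominates.
        front = [name for name, s in remaining.items()
                 if not any(dominates(t, s)
                            for other, t in remaining.items() if other != name)]
        levels.append(sorted(front))
        for name in front:
            del remaining[name]
    return levels

# Hypothetical algorithms scored on two lower-is-better indicators.
scores = {
    "alg_a": (0.10, 0.30),
    "alg_b": (0.20, 0.20),
    "alg_c": (0.25, 0.35),
    "alg_d": (0.40, 0.40),
}
print(pareto_levels(scores))
```

Here `alg_a` and `alg_b` trade off against each other and share the first level, while `alg_c` and `alg_d` are each dominated and fall to later levels.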

algorithm, artificial intelligence, optimization problem, (17 more...)

arXiv.org Artificial Intelligence

2411.17999

Country:

North America > United States > Michigan > Ingham County > Lansing (0.04)
North America > United States > Michigan > Ingham County > East Lansing (0.04)
North America > United States > Massachusetts (0.04)
(4 more...)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area (0.68)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Fast solution to the fair ranking problem using the Sinkhorn algorithm

Uehara, Yuki, Ikeda, Shunnosuke, Nishimura, Naoki, Ohashi, Koya, Li, Yilin, Yang, Jie, Jobson, Deddy, Zha, Xingxia, Matsumoto, Takeshi, Sukegawa, Noriyoshi, Takano, Yuichi

arXiv.org Artificial Intelligence, Jun-10-2024

In two-sided marketplaces such as online flea markets, recommender systems for providing consumers with personalized item rankings play a key role in promoting transactions between providers and consumers. Meanwhile, two-sided marketplaces face the problem of balancing consumer satisfaction and fairness among items to stimulate activity of item providers. Saito and Joachims (2022) devised an impact-based fair ranking method for maximizing the Nash social welfare based on fair division; however, this method, which requires solving a large-scale constrained nonlinear optimization problem, is very difficult to apply to practical-scale recommender systems. We thus propose a fast solution to the impact-based fair ranking problem. We first transform the fair ranking problem into an unconstrained optimization problem and then design a gradient ascent method that repeatedly executes the Sinkhorn algorithm. Experimental results demonstrate that our algorithm provides fair rankings of high quality and is about 1000 times faster than application of commercial optimization software.
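
The abstract's method calls the Sinkhorn algorithm repeatedly inside a gradient ascent loop. The sketch below shows only a single, generic Sinkhorn run, which scales a positive matrix until every row and column sums to 1. Reading the result as an items-by-positions ranking matrix is an assumption made for illustration, as are the score values and the `epsilon` temperature. The paper's objective and gradient step are not reproduced.

```python
import math

def sinkhorn(scores, epsilon=0.5, n_iter=200):
    """Scale exp(scores / epsilon) into an approximately doubly stochastic matrix."""
    n = len(scores)
    kernel = [[math.exp(s / epsilon) for s in row] for row in scores]
    u = [1.0] * n  # row scaling factors
    v = [1.0] * n  # column scaling factors
    for _ in range(n_iter):
        # Alternate between fixing the row sums and fixing the column sums.
        u = [1.0 / sum(kernel[i][j] * v[j] for j in range(n)) for i in range(n)]
        v = [1.0 / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(n)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(n)] for i in range(n)]

# Hypothetical affinity of three items (rows) for three ranking positions (columns).
scores = [
    [0.9, 0.5, 0.1],
    [0.4, 0.8, 0.3],
    [0.2, 0.3, 0.7],
]
plan = sinkhorn(scores)
for row in plan:
    print([round(x, 3) for x in row])
```

Each iteration uses only multiplications and sums over the matrix, which is why the procedure is cheap to repeat many times inside an outer optimization loop.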

algorithm, fair ranking problem, sinkhorn algorithm, (15 more...)

arXiv.org Artificial Intelligence

2406.10262

Country:

Asia > Japan > Honshū > Kantō > Ibaraki Prefecture > Tsukuba (0.06)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
North America > United States (0.04)

Genre: Research Report (0.71)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.57)

BIRP: Bitcoin Information Retrieval Prediction Model Based on Multimodal Pattern Matching

Kim, Minsuk, Kim, Byungchul, Yong, Junyeong, Park, Jeongwoo, Kim, Gyeongmin

arXiv.org Artificial Intelligence, Aug-14-2023

Financial time series have historically been assumed to be a martingale process under the Random Walk hypothesis. Instead of making investment decisions using the raw prices alone, various multimodal pattern matching algorithms have been developed to help detect subtly hidden repeatable patterns within the financial market. Many of the chart-based pattern matching tools only retrieve similar past chart (PC) patterns given the current chart (CC) pattern, and leave the entire interpretive and predictive analysis, and thus ultimately the final investment decision, to the investors. In this paper, we propose an approach of ranking similar PC movements given the CC information and show that exploiting this as additional features improves the directional prediction capacity of our model. We apply our ranking and directional prediction modeling methodologies to Bitcoin because its highly volatile prices make its future movements challenging to predict.

data mining, machine learning, pattern recognition, (16 more...)

arXiv.org Artificial Intelligence

2308.08558

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.05)
(4 more...)

Genre: Research Report (0.40)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.83)

Demonstrating Large-Scale Package Manipulation via Learned Metrics of Pick Success

Li, Shuai, Keipour, Azarakhsh, Jamieson, Kevin, Hudson, Nicolas, Swan, Charles, Bekris, Kostas

arXiv.org Artificial Intelligence, Jun-27-2023

Automating warehouse operations can reduce logistics overhead costs, ultimately driving down the final price for consumers, increasing the speed of delivery, and enhancing the resiliency to workforce fluctuations. The past few years have seen increased interest in automating such repeated tasks but mostly in controlled settings. Tasks such as picking objects from unstructured, cluttered piles have only recently become robust enough for large-scale deployment with minimal human intervention. This paper demonstrates large-scale package manipulation from unstructured piles in Amazon Robotics' Robot Induction (Robin) fleet, which utilizes a pick success predictor trained on real production data. Specifically, the system was trained on over 394K picks. It is used for singulating up to 5 million packages per day and has manipulated over 200 million packages during this paper's evaluation period. The developed learned pick quality measure ranks various pick alternatives in real-time and prioritizes the most promising ones for execution. The pick success predictor aims to estimate from prior experience the success probability of a desired pick by the deployed industrial robotic arms in cluttered scenes containing deformable and rigid objects with partially known properties. It is a shallow machine learning model, which allows us to evaluate which features are most important for the prediction. An online pick ranker leverages the learned success predictor to prioritize the most promising picks for the robotic arm, which are then assessed for collision avoidance. This learned ranking process is demonstrated to overcome the limitations of, and to outperform, manually engineered and heuristic alternatives. To the best of the authors' knowledge, this paper presents the first large-scale deployment of learned pick quality estimation methods in a real production system.

artificial intelligence, machine learning, suction cup, (19 more...)

arXiv.org Artificial Intelligence

2305.10272

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
Asia > South Korea > Daegu > Daegu (0.04)

Genre: Research Report (0.64)

Industry:

Information Technology (0.88)
Transportation (0.66)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
